Elements of knowledge-free and unsupervised lexical acquisition
Abstract
attributes that can be seen as equivalent to semantic primitives, except that these abstract attributes do not automatically receive names like those found in manually created collections of semantic primitives. To clarify this analogy, consider the word leap. Apart from initial attributes such as ‘frog’ and ‘leg’, which emerge from an initial analysis of a corpus in which leap occurs frequently, other abstract attributes could emerge from clustering. This would result in leap becoming part of one or more clusters resembling abstract attributes. The first, for example, could be a cluster built from the words leap, jump, run and go, and could thus be seen as representing the semantic primitive move. The second cluster could comprise the words leap, big, huge and gigantic, and therefore represent the semantic primitive large. A third cluster could be built from the words leap, sudden, fast and jerky, and represent the primitive sudden. The ‘meaning’ of the word leap could then be represented either by the semantic primitives large sudden move, which is interesting only for humans, or by the three clusters representing the attributes, which is something applications can make use of.
Fuzziness is another seemingly problematic feature, because semantic primitives are commonly defined either to apply (possibly positively versus negatively) or not to apply, without fuzziness. The meaning of the word frog therefore clearly possesses the primitive +living. However, as described in Section 1.3, this model aims to analyze the structure of language, not the relation between parts of that structure and the real world. Moreover, whether +living is correct for the word frog depends on its usage: it might have been used as a name or as a concept for something else. In such cases, +living might still be slightly applicable, but not as clearly as before.
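The leap example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cluster ids (`c1` to `c3`), the helper names `attributes_of` and `shared_attributes`, and the crisp set membership are all hypothetical and not taken from the source, which does not specify a data structure. The point is that an application can work with the unnamed clusters directly, while the labels in the comments matter only to human readers.

```python
# Hypothetical clusters, as they might emerge from clustering corpus data.
# The cluster ids are arbitrary; the labels in the comments are for humans only.
clusters = {
    "c1": {"leap", "jump", "run", "go"},        # roughly the primitive 'move'
    "c2": {"leap", "big", "huge", "gigantic"},  # roughly the primitive 'large'
    "c3": {"leap", "sudden", "fast", "jerky"},  # roughly the primitive 'sudden'
}


def attributes_of(word, clusters):
    """Return the sorted ids of all clusters that contain the word."""
    return sorted(cid for cid, members in clusters.items() if word in members)


def shared_attributes(word_a, word_b, clusters):
    """Return the sorted ids of clusters that contain both words."""
    common = set(attributes_of(word_a, clusters)) & set(attributes_of(word_b, clusters))
    return sorted(common)


# The 'meaning' of leap, expressed as cluster membership instead of named primitives.
leap_meaning = attributes_of("leap", clusters)
```

Membership is crisp here for brevity. A graded variant would store a weight per word and cluster instead of a set, which would be closer to the fuzziness discussed above.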
Further phenomena commonly encountered are linguistic relations such as antonymy, synonymy and hyperonymy, which do not depend on the values of specific semantic categories. These abstract semantic relations are important for many NLP applications and have traditionally been encoded manually in lexical-semantic structures such as WordNet (Miller, 1990; Fellbaum, 1998), GermaNet (Hamp and Feldweg, 1997; Kunze and Wagner, 1999), and EuroWordNet (Vossen, 1998). One important application of these resources is to infer other relations (cf. Richardson, 1997). In WordNet, the collection of semantic relations includes:
• Hyperonymy and hyponymy, also sometimes called the is-a relation. It holds between two ‘concepts’ (which are sets of words) whenever one has a more abstract meaning than the other.
• Meronymy: two words are meronyms whenever one denotes something that is part of the other word’s denotation. In WordNet, meronyms are split into several types: part-of, member-of and substance. The differences between these types depend on the type of the denotation (countable, fluid, etc.).
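The relations listed above can be illustrated with a small, self-contained sketch. It does not use WordNet or any real library: the dictionaries `HYPERNYM` and `MERONYM`, their toy entries, and the helper functions are hypothetical, and single words stand in for WordNet's concepts (sets of words). The sketch assumes that the is-a relation can be followed transitively, which is one simple way in which further relations can be inferred from the encoded ones.

```python
# Toy is-a links (hyponym -> hyperonym). Single words stand in for concepts.
HYPERNYM = {
    "frog": "amphibian",
    "amphibian": "animal",
    "animal": "organism",
}

# Toy meronymy links, keyed by (part, whole), with the three types from the text.
MERONYM = {
    ("leg", "frog"): "part-of",
    ("tree", "forest"): "member-of",
    ("wood", "tree"): "substance",
}


def hypernym_chain(concept):
    """Follow is-a links upwards and return all more abstract concepts in order."""
    chain = []
    seen = {concept}
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        if concept in seen:  # guard against accidental cycles in the toy data
            break
        chain.append(concept)
        seen.add(concept)
    return chain


def is_a(specific, general):
    """True if `general` is reachable from `specific` via is-a links."""
    return general in hypernym_chain(specific)


def meronym_type(part, whole):
    """Return the meronymy type for (part, whole), or None if none is encoded."""
    return MERONYM.get((part, whole))
```

For example, `is_a("frog", "animal")` holds although only frog-amphibian and amphibian-animal are stored explicitly, whereas `meronym_type("frog", "leg")` returns `None` because meronymy is directional.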
Similar resources
Semi-automatic Acquisition of Lexical Resources and Grammars for Event Extraction in Bulgarian and Czech
In this paper we present a semi-automatic approach for acquisition of lexico-syntactic knowledge for event extraction in two Slavic languages, namely Bulgarian and Czech. The method uses several weakly supervised and unsupervised algorithms, based on distributional semantics. Moreover, an intervention from a language expert is envisaged on different steps in the learning procedure, which increase...
A Graph Model for Unsupervised Lexical Acquisition
This paper presents an unsupervised method for assembling semantic knowledge from a part-of-speech tagged corpus using graph algorithms. The graph model is built by linking pairs of words which participate in particular syntactic relationships. We focus on the symmetric relationship between pairs of nouns which occur together in lists. An incremental cluster-building algorithm using this part of...
The production of lexical categories (VP) and functional categories (copula) at the initial stage of child L2 acquisition
This is a longitudinal case study of two Farsi-speaking children learning English: ‘Bernard’ and ‘Melissa’, who were 7;4 and 8;4 at the start of data collection. The research deals with the initial state and further development in the child second language (L2) acquisition of syntax regarding the presence or absence of copula as a functional category, as well as the role and degree of L1 influe...
The Effect of Raising Morphological Decomposition Awareness on Lexical Knowledge of Complex English Words
Lexical knowledge of complex English words is an important part of language skills and crucial for fluent language use. This study aimed to assess the role of morphological decomposition awareness as a vocabulary learning strategy on learners’ productive and receptive recall and recognition of complex English words. University students majoring in English at the...
Multiobjective Optimization and Unsupervised Lexical Acquisition for Named Entity Recognition and Classification
In this paper, we investigate the utility of unsupervised lexical acquisition techniques to improve the quality of Named Entity Recognition and Classification (NERC) for resource-poor languages. As it is not a priori clear which unsupervised lexical acquisition techniques are useful for a particular task or language, careful feature selection is necessary. We treat feature selection as a mu...
Data-driven Paraphrasing and Stylistic Harmonization
This thesis proposal outlines the use of unsupervised data-driven methods for paraphrasing tasks. We motivate the development of knowledge-free methods with the guiding use case of multi-document summarization, which requires a domain-adaptable system for both the detection and generation of sentential paraphrases. First, we define a number of guiding research questions that will be addressed in ...
Publication date: 2007